Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

125 fix germline variant calling #128

Merged
merged 7 commits into from
Sep 25, 2023
Merged

Conversation

evelyn-schmidt
Copy link
Member

This PR is related to issue #125

Enables the user to override the default for filtering of germline SNPs

Copy link
Member

@chrisamiller chrisamiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this generally seems fine, and solves the immediate problem as I understand it, as long as the inputs are set properly (see above comment about defaults).

Has this actually been tested on some data and shown to have the desired effect?

I do wonder if this germline calling should be reviewed and brought up to date with the latest and greatest GATK recommendations, but that's a separate issue!

@@ -199,6 +199,7 @@ workflow immuno {
Array[Array[String]] gatk_haplotypecaller_intervals
Int? ploidy
String? optitype_name
Float germline_filter_gnomAD_maximum_population_allele_frequency
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should immuno go ahead and set a default at the top level of 1.0 so that no variants get filtered?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing now! I left germline_filter_gnomAD_maximum_population_allele_frequency set to 0.05 in germline_detect_variants. Would it be more correct style wise to set that value at the top level and have it be passed down?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our general style is to define the defaults as low down as possible (at the tool level), and then pass parameters upward as needed so that they can be overridden (with the override being set as high up as possible, so we don't have to go digging through 8 layers to find it). In the top-level immuno wdl, I think it would be reasonable to set that value such that it always passes all germline variants, since various steps in that workflow depend on it.

@evelyn-schmidt
Copy link
Member Author

The change has been tested with germline_filter_gnomAD_maximum_population_allele_frequency set at three separate values: 1.1 (which results in no filtering), .05, and .01 (which results in extreme filtering).

To verify that these changes took place, I viewed the intermediate files between the gnomadFrequencyFilter step and the setFilteredVcfName (within germline_filter_vcf.wdl). After investigating these files I found that the run with the filter of 1.1 resulted in files that were the exact same size with the same values (as expected) while the run with the filter of 0.01 had a file that was significantly smaller in size than the pre-filtering file, removing all entries with a gnomAD AF above 0.01.

Users can now set the value within the yaml file or choose not to set it, in which the default value will be 1.1 resulting in no filtering.

Copy link
Member

@tmooney tmooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a few files in this PR ended up with mode changes.

@evelyn-schmidt
Copy link
Member Author

evelyn-schmidt commented Aug 24, 2023

Looks like a few files in this PR ended up with mode changes.

Those files have now been fixed.

Copy link
Member

@tmooney tmooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 📇

@malachig malachig merged commit f27f9df into main Sep 25, 2023
2 checks passed
@malachig malachig deleted the 125-fix-germline-variant-calling branch October 15, 2024 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix germline variant calling inputs for the pvactool proximal variation analysis
4 participants